Gamified online surveys: Assessing experience with self-determination theory

We developed four online interfaces supporting citizen participation in decision-making. We included (1) learning loops (LLs), good practice in decision analysis, and (2) gamification, to enliven an otherwise long and tedious survey. We investigated the effects of these features on drop-out rate, perceived experience, and basic psychological needs (BPNs): autonomy, competence, and relatedness, all from self-determination theory. We also investigated how BPNs and individual causality orientation influence experience of the four interfaces. Answers from 785 respondents, representative of the Swiss German-speaking population in age and gender, provided insightful results. LLs and gamification increased drop-out rate. Experience was better explained by the BPN satisfaction than by the interface, and this was moderated by respondents’ causality orientations. LLs increased the challenge, and gamification enhanced the social experience and playfulness. LLs frustrated all three needs, and gamification satisfied relatedness. Autonomy and relatedness both positively influenced the social experience, but competence was negatively correlated with challenge. All observed effects were small. Hence, using gamification for decision-making is questionable, and understanding individual variability is a prerequisite; this study has helped disentangle the diversity of responses to survey design options.


Introduction S12. Additional question about relatedness to nonplayer characters (RQ6)
Finally, we focused on nonplayer characters (NPCs), because our previous gamification prototypes failed to create relatedness to them (Aubert & Lienert, 2019; Aubert, Lienert, et al., 2022). We found almost no studies that investigated nonplayer characters per se (Mekler et al., 2017; Wee & Choong, 2019; Xi & Hamari, 2019), despite the proposition that a virtual relationship with an NPC can satisfy relatedness (Ryan & Rigby, 2019; Ryan & Deci, 2017a). Published results pertinent to NPCs suggest that a meaningful story with avatars and teammates should satisfy the need for social relatedness (Sailer et al., 2017). Design of NPCs can be very demanding. For instance, in video games, artificial intelligence is used to adapt the NPC to the player (e.g., Kopel & Hajas, 2018). Others have already asked how computer-generated personalities and artificial intelligence can satisfy the need for relatedness (Przybylski et al., 2010). This leads to our final research question: RQ6: Can respondents relate to our relatively simple nonplayer characters?

S2. Description of the objectives
The ten objectives were organized in a four-branch hierarchy of objectives. Aubert, Scheidegger, Schmid Suppl. Material: Gamified online surveys: Assessing experience with self-determination theory PlosOne, DOI: 10.1371/journal.pone.0292096

High removal of organic matter
The chemical formula of nitrogen in the icon stands for the removal of nitrogen compounds present in wastewater. Nitrogen compounds in the wastewater come from organic waste products (e.g. food waste and excrement); some of them can kill fish. The objective is expressed as a percentage (%) of removal of the nitrogen compounds present in the wastewater.
In the best case, the systems remove 90 % of these nitrogen compounds, in the worst case only 88 %.

High removal of micropollutants
One aspect of wastewater treatment is the removal of existing micropollutants due to pharmaceutical residues, hormones, cosmetics, etc. For this reason, the logo of this objective is a tablet. These substances pollute water bodies and affect aquatic life. For example, they can disrupt fish reproduction. The objective is expressed as a percentage (%) of the removal of the micropollutants present.
In the best case, 86 % of these micropollutants can be removed, in the worst case only 12 %.

High recovery of phosphorus
Wastewater contains phosphorus, whose recovery and recycling are important. The phosphorus that is found in urine is an important natural and locally-produced fertiliser, as symbolised by the plant depicted on the logo. The objective is expressed as a percentage (%) of phosphorus recovery.
In the best case, 81 % of the dissolved phosphorus in wastewater can be recovered; in the worst case, no phosphorus is recovered at all (0 %).

Low use of water
Wastewater treatment systems have to cope with a more or less large amount of water. Saving water is extremely important because this resource is becoming increasingly scarce and expensive. For this reason, the logo is a drop of water. The objective is expressed in liters per person per day (L/P*day).
Toilet flushing alone consumes 27 L/P*day in the worst case, which could be reduced to 0 L/P*day in the best case!

Low net energy consumption
Overall, wastewater treatment requires a lot of energy. The lightning bolt on the logo symbolises this. Only if we save energy can we reduce greenhouse gas emissions and counteract climate change! The objective is expressed in kilowatt hours per person per year (kWh/P*year).
In the worst case, the energy consumption is 280 kWh/P*year. This corresponds to ten 100-watt light bulbs burning for 280 hours (almost twelve days). In the best case, it is 15 kWh/P*year, which corresponds to ten 100-watt light bulbs burning for 15 hours (not even one day).

High health for user
The cross on the logo symbolises health protection. Due to the human excrement contained in wastewater, wastewater contains countless bacteria. The more one comes in contact with these germs, the greater the risk of a stomach or intestinal infection. The objective is expressed as the number of contacts with parts of the sewage system per year (contacts per year).
In the best case, there are no contacts (0 contacts) with such bacteria per year. In the worst case, this number is 5 contacts per year.

High attractiveness for user
Attractive toilets play a role in our sense of well-being. The logo stands for clean, odourless toilets. The sight and smell of sewage and excrement usually disgusts us. The goal is indicated on a 10-point scale from 0 (unattractive) to 10 (attractive).
In the best case, an option can score 9 out of 10 points, in the worst case 3 out of 10 points.

Low time demand for user
Another aspect of wastewater treatment is the time factor: the process should cost the users as little time as possible. This is why an hourglass is depicted on the logo. Some systems are not only controlled by professionals, but require additional operation, maintenance or control measures on the part of the users, which takes up valuable time.
The objective is expressed in hours per person per year (h/P*year).
In the best case, users spend no time on this (0 h/P*year), in the worst case 24 h/P*year.

Low annual operating and maintenance costs
The annual operating and maintenance costs should be as low as possible. Therefore, the logo for this objective embodies money. Wastewater treatment requires human and material resources. The associated costs influence taxpayers' contributions. The objective is expressed in Swiss francs per person per year (CHF/P*year).
Extrapolated over 20 years, a system costs CHF 433/P*year in the best case and CHF 636/P*year in the worst case.

High flexibility (inter-generation fairness)
The systems used should be flexible so that future generations can easily adapt them. The logo embodies these generations. A shorter lifespan is better for flexibility and adaptation than a longer lifespan. The objective is expressed in number of years.
Flexible systems have a remaining lifetime of 5 years after 2040 in the best case, while inflexible systems have up to 25 years (worst case).

S3. Description of the alternatives
In the following

Rehabilitation of the local wastewater treatment plant
To extend the service lifetime of the local wastewater treatment plant (WWTP), rehabilitation is necessary. It is not necessary to switch to new technologies in the process.
The sewer system (no. 1 on the picture), which carries away urine, faeces and household wastewater, as well as the sewer system (2), which is several kilometers long, must also be rehabilitated. All sewage is still treated in the local WWTP (3) at the edge of the village and the treated water is discharged into the river. In rural areas, such rehabilitations are mainly financed by the municipalities themselves, as the subsidies granted by the federal government in the 1960s and 1990s as an incentive to build WWTPs have not been renewed.

Connection to the regional WWTP
The village's existing sewer network will be rehabilitated and connected to the sewer network of the nearest town. In future, the wastewater (1) (urine, faeces and mixed household wastewater) will be channeled through several dozen kilometers of sewers (2) to a larger wastewater treatment plant (3). This plant has a high treatment capacity and uses modern technologies. The local WWTP will be shut down and dismantled, while the large treatment plant will be used by both the city and the surrounding villages.

Decentralised package plants
These small wastewater treatment plants (package plants) are monitored by the householders themselves and serve to treat the wastewater of individual or several households. Urine and faeces are collected via conventional toilets and fed with the household wastewater to a package plant, where they are treated using natural or technical processes. In natural treatment, the wastewater previously stored in a tank (1) is purified by microorganisms in a reed bed ecosystem (2). In technical treatment, the wastewater is purified in tanks (3) in the basements or underground in the respective home gardens. In both cases, the treated wastewater is then infiltrated into the garden (4) or discharged into a river (5). The solid wastewater components, so-called sewage sludge, are transported by truck to a large wastewater treatment plant (6) for processing.

Package plant with urine separation
The package plant for single households is combined with modern technologies for urine separation in the toilet bowl (1), for example with special NoMix toilets. Additional urine collection tanks (2) are installed in the basement or in the garden. Once a year, the collected urine is transported to a fertiliser factory (3). The faeces and household wastewater are fed into a package plant (4). The treated wastewater is infiltrated in the garden (5) or discharged into the nearest river (6), while the sewage sludge is transported to a large-scale treatment plant (7) for processing.

Package plant with urine and faeces separation
This decentralised package plant system is based on the use of dry toilets (1) without flushing: urine and faeces are separated directly in the toilet bowl. The urine is stored on site in a tank (2) and then transported to a fertiliser factory (3). The faeces are stored in a tank (4) and then transported to a composting plant (5) to be processed as fertiliser. The household wastewater, e.g. from the shower, is stored and treated on site in a package plant (6). It is then infiltrated into the garden (7) or discharged into the nearest river (8). The sewage sludge is transported to a large-scale treatment plant (9) for processing.

Construction of a new local wastewater treatment plant with urine separation
The existing WWTP is dismantled; a new WWTP is built on the same site. Modern technologies (NoMix toilets (1)) are installed in all households; this allows urine to be separated from faeces and household wastewater. Tanks (2) in the basement or garden store the urine and are emptied once a year. The urine is transported to a fertiliser factory (3). The faeces and household wastewater are transported via the local sewage system (4) to the new WWTP (5), where they are purified.

General information about the wastewater management alternatives.
Initial ranking in order of preference of the wastewater management alternatives (drag and drop).
A meeting is set for the next day: the candidates wish to read the article before publication.

Dialogue with the activists
The ball is in your court! Talk to each of the 10 activists to find out what their motivations and goals are.

Map of New Waterton (Neuwässerli)
The citizens were advocates for one objective.
They explained what their objective means, why it is important, and what the range of variation between the different options is. The citizens carried a pictogram on their T-shirt representing the objective.
The players chose the order in which to meet with the citizens of New Waterton.

Debates at the Bargain
The evening has finally come. Your election event begins. You meet all the activists from the afternoon again. They will once again point out their goals in the structured debates and try to convince you of their proposals. And you, dear election candidate, will not be able to avoid compromising.

(only if learning loop) Trade-off weight elicitation
Verbal jousting: in duels, advocates presented contrasting situations, and asked the players which they preferred. In order not to upset any advocate, the players then had to propose a compromise situation, i.e. find an indifference point between the situations of the two advocates. Specifically, the players adjusted the least preferred situation by improving the fulfillment of the most important objective until they were indifferent between the two situations. This step corresponded to the trade-off method.

Enjoying the evening
After the debates, it is time to have a drink, especially water, but it can be something else! As it is your election event, you are allowed to serve the drinks.

Swing weight elicitation
The players offered drinks to the advocates proportionally to their affinity with them (i.e. how they preferred the objectives). The players first selected the objective that should be improved from worst to best as first priority. Then, relatively to this chosen one, the players rated the importance of improving the other objectives from worst to best.

The hour of truth
At the end of the evening you are alone with the bartender. He has recorded your opinion exactly. If the two sets of weights are inconsistent, respondents repeat either one of the methods.
After one repetition, it is possible to "move on anyway" and either choose one of the two sets of weights or confirm that neither represents the preferences (in which case equal weights are assigned).

Taking stock
It was a good but short night. It is time to take stock of your day yesterday and analyze the consistency of your opinions together with the engineer. The journalist can thus still make adjustments in his article and publish it: this promises to be a decisive factor in your election campaign!

Hypothetical alternatives as defined in swing were simplified to the objective that is improved from worst to best (without specifically mentioning the other objectives that remain at the worst level).
In the first step of swing (ranking), respondents did not rank-order the hypothetical alternatives/objectives but only selected the one that is most important to improve from worst to best.
In the second step of swing (relative scoring), respondents scored the objectives visually rather than with numerical values (i.e., instead of scoring from 0 to 100). They filled the glasses of the advocating citizens relative to the full glass of the citizen advocating for the most important objective (as defined in the first step).
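
The visual scoring can be converted into weights by normalization, as in the standard swing method. The following is a minimal sketch, assuming that each glass fill level is recorded as a fraction of a full glass (1.0 for the most important objective). The function name `swing_weights` and the example values are hypothetical and are not taken from the survey software.

```python
def swing_weights(fill_levels):
    """Normalize relative glass fill levels into weights that sum to 1.

    fill_levels maps each objective to its fill level, expressed as a
    fraction of the full glass of the most important objective.
    """
    if not fill_levels:
        raise ValueError("at least one objective is required")
    if any(level < 0 for level in fill_levels.values()):
        raise ValueError("fill levels must be non-negative")
    total = sum(fill_levels.values())
    if total == 0:
        raise ValueError("at least one glass must be filled")
    return {obj: level / total for obj, level in fill_levels.items()}


# Hypothetical respondent: micropollutants is the most important objective
# (full glass); the two other glasses are half full.
example_fills = {"micropollutants": 1.0, "energy": 0.5, "costs": 0.5}
example_weights = swing_weights(example_fills)
print(example_weights)
```

Only the ratios between fill levels matter here: doubling every fill level would yield the same weights.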

S4.4. The learning loops
There were two learning loops: the "internal" loop on objectives was embedded in the "outer" loop on options. The weights were considered consistent if they differed by less than 10% for each objective between the swing and trade-off methods. Otherwise, the respondents could repeat either one of the methods. If the weights remained inconsistent, the respondents could continue by indicating which set of weights represented their preferences, or continue anyway (in which case the software assigned equal weights by default). We followed standard swing and trade-off methods for weight elicitation, which are somewhat complex and repetitive (e.g., Eisenführ et al., 2010). The swing method is composed of two steps: first, the respondents rank hypothetical alternatives from most preferred to least preferred, and second, they assign a score from 100 (most preferred) to 0 (least preferred) to the hypothetical alternatives. Each hypothetical alternative in swing has all objectives at their worst level except one, which is at its best level. The trade-off method comprises two steps repeated n-1 times (with n the total number of objectives). First, the respondents make a preference judgment between a pair of alternatives differing in two objectives, one being at the worst level and the other at the best level. Second, they adjust the fulfillment of the least preferred alternative to reach indifference.
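
The consistency rule for the internal loop can be sketched as follows. This is an illustration under stated assumptions, not the survey software: it assumes that both weight sets are normalized to sum to 1 and that "less than 10%" means an absolute difference below 0.10 per objective (the text does not say whether the difference is absolute or relative). The names `weights_consistent` and `equal_weights` and all example values are hypothetical.

```python
def weights_consistent(swing, tradeoff, tolerance=0.10):
    """Compare two weight sets objective by objective.

    Returns (is_consistent, differences), where differences maps each
    objective to the absolute difference between its two weights.
    ASSUMPTION: the 10% threshold is an absolute difference of weights.
    """
    if swing.keys() != tradeoff.keys():
        raise ValueError("both weight sets must cover the same objectives")
    differences = {obj: abs(swing[obj] - tradeoff[obj]) for obj in swing}
    is_consistent = all(diff < tolerance for diff in differences.values())
    return is_consistent, differences


def equal_weights(objectives):
    """Fallback used when neither weight set is confirmed: equal weights."""
    objectives = list(objectives)
    return {obj: 1.0 / len(objectives) for obj in objectives}


# Hypothetical weights for three objectives.
swing = {"micropollutants": 0.50, "energy": 0.30, "costs": 0.20}
close_tradeoff = {"micropollutants": 0.46, "energy": 0.32, "costs": 0.22}
far_tradeoff = {"micropollutants": 0.35, "energy": 0.35, "costs": 0.30}

print(weights_consistent(swing, close_tradeoff)[0])  # all differences < 0.10
print(weights_consistent(swing, far_tradeoff)[0])    # micropollutants differs by 0.15
print(equal_weights(swing))
```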
The two methods used for the learning loop on options were the initial ranking of options, and the ranking of options calculated from the elicited weights after aggregation with an additive model (see textbook Eisenführ et al., 2010, for additive aggregation model with multi-attribute value theory). The rankings of options were considered consistent if the two rankings of all six options were identical. Otherwise, respondents had to rank the options again, after being informed of how well each option fulfills the objectives. If the rankings remained inconsistent, the respondents could repeat the weight elicitation (re-entering the internal loop), or continue by indicating which ranking represented their preferences best.
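
The outer loop can be illustrated with a small additive aggregation, in which the overall value of an option is the weighted sum of its single-objective values. This is a sketch under stated assumptions: the single-objective value functions are assumed to be linear between the worst and best levels (the text does not specify their shape), only three of the ten objectives are used, and the three options and their levels are invented for illustration. The worst and best levels are those given in S2.

```python
def linear_value(level, worst, best):
    """Map a level to [0, 1]: 0 at the worst level, 1 at the best level.

    ASSUMPTION: linear value function. Works whether higher or lower
    levels are better, because worst and best are passed explicitly.
    """
    return (level - worst) / (best - worst)


def additive_value(levels, ranges, weights):
    """Overall value of one option: weighted sum of single-objective values."""
    return sum(
        weights[obj] * linear_value(levels[obj], *ranges[obj]) for obj in weights
    )


def rank_options(options, ranges, weights):
    """Return option names ordered from highest to lowest overall value."""
    scores = {
        name: additive_value(levels, ranges, weights)
        for name, levels in options.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rankings_consistent(initial, calculated):
    """Rankings are consistent only if every option has the same position."""
    return list(initial) == list(calculated)


# (worst, best) levels from S2 for three of the ten objectives.
ranges = {
    "micropollutants": (12, 86),   # % removed
    "energy": (280, 15),           # kWh/P*year
    "costs": (636, 433),           # CHF/P*year
}
weights = {"micropollutants": 0.6, "energy": 0.2, "costs": 0.2}  # hypothetical

# Hypothetical options, not the six alternatives of the survey.
options = {
    "option_A": {"micropollutants": 86, "energy": 280, "costs": 636},
    "option_B": {"micropollutants": 12, "energy": 15, "costs": 433},
    "option_C": {"micropollutants": 49, "energy": 147.5, "costs": 534.5},
}

calculated = rank_options(options, ranges, weights)
initial = ["option_B", "option_A", "option_C"]  # hypothetical initial ranking
print(calculated, rankings_consistent(initial, calculated))
```

In this example the calculated ranking differs from the initial one, which is the situation in which respondents were asked to rank the options again.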

S5. Sample characteristics
Ordered from the market research company
Defined in the quote with the market research company with quotas "national representative according to sex and age (in 4 categories) for all subsamples", based on statistics from the Swiss Federal Statistical Office (Bundesamt für Statistik, BfS).
We compared the main samples (noLL vs. LL and nogam vs. gam) based on their age with a Welch Two Sample t-test.
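
For readers unfamiliar with the Welch test, the following minimal sketch computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. It also converts t to an effect size with the common formula r = sqrt(t² / (t² + df)); that this is the conversion used for the reported r values is an assumption. The p-value is omitted because it requires the t distribution, which the standard library does not provide. The function names and the toy samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two samples
    with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


def effect_size_r(t, df):
    """Effect size r from a t statistic (ASSUMED conversion formula)."""
    return sqrt(t ** 2 / (t ** 2 + df))


# Toy data, not the survey data.
t_toy, df_toy = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_toy, df_toy)

# Applying the conversion to the reported values t(766.99) = 0.025.
print(round(effect_size_r(0.025, 766.99), 2))
```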

Age x Sex
• The age distributions in noLL and LL were the same (t(766.99) = 0.025, p = .980, r = .00).
• Levene's test showed that the variances in age were similar for all subsamples (F(3,765) = 0.832, p = .477, ω² = -.001). There was no significant difference between the ages of the subsamples.
• There was no significant difference in gender in the groups of LL and noLL (χ²(1) = 0.06, p = .804, V = 0).
• There was no significant difference between the distribution of gender in the subsamples (χ²(3) = 0.44, p = .932, V = .02).
We compared the main samples (noLL vs. LL and nogam vs. gam) based on their education with a Chi-squared test.
• There was no significant difference in education in the groups of nogam and gam (χ²(4) = 3.79, p = .435, V = .07).
We compared the smaller subsamples (nogam-noLL, nogam-LL, gam-noLL, gam-LL) based on their education with a Chi-squared test with simulated p-value (based on 2000 replicates, because not all expected values were >5).
• There was no significant difference between the distribution of education in the subsamples (χ²(12) = 8.978, p = .704, V = .06).

These items pertain to a series of hypothetical sketches. Each sketch describes an incident and lists three ways of responding to it.
Please read each sketch, imagine yourself in that situation, and then consider each of the possible responses. Think of each response option in terms of how likely it is that you would respond that way.
a) "I can't do anything right," and feel sad.
b) "I wonder how it is I did so poorly," and feel disappointed.
As you look forward to the evening, you probably expect that:
b) The other person probably "did the right things" politically to get the job.
c) You would probably take a look at factors in your own performance that led you to be passed over.
[gcos10] 10. You are starting a new career.
11. A woman who works for you has generally done an adequate job. However, for the past two weeks her work has not been up to par and she appears to be less actively interested in her work.
The following questions are about your feelings towards your fellow Swiss citizens who are facing a decision on wastewater management. In the following, they will be referred to as 'citizens' for short.
To what extent are the following statements true for you?
Scale: Same as Autonomy

Given our measured range of orientation scores for the three constructs, the predictions showed that answering the gamified interface predicted a better experience than answering the nongamified interface.

Considering the autonomy orientation revealed that high scores on the autonomy-orientation scale predicted a more positive mean experience than low autonomy-orientation scores, particularly for the respondents receiving the interface with learning loop (Fig. 7, top panel, in the main text).
High scores on the controlled-orientation scale predicted a slightly better mean experience than low controlled-orientation scores, independently of the interface (Fig. 7, middle panel). The scores on controlled orientation did not influence how the interface was experienced.
Respondents with high impersonal scores receiving the nongamified interface with learning loop tended to have a worsened experience.
positive" (positive, SM S7610), and "it makes sense that animals and not people were used, but it's about an important topic and I don't need children's characters" (both negative and positive).
Overall, the comments showed us that opinions about our design choices varied greatly. Some clearly enjoyed interacting with sympathetic and benevolent animal characters, while others did not identify with these childish characters. The latter group often mentioned that the topic was too serious for a game, suggesting that they would not appreciate any nonplayer characters of whatever style. Overall, qualitative data reinforces the observation of high individual variability in the perception of gamification.

Text S761. Additional examples of comments coded as neutral.
"It's okay", "no strong opinion about it"; "They were simple and did not distract from the content"; "They were presented very simply, without much distraction. It was easy to concentrate on the questions"

Text S762. Additional examples of comments coded as unclear.
"simple display", "animal characters", "design"; "identification"; "visual design, small activities (e.g. blinking)"

"Very childlike style"; "Not liked: Looked like characters from a children's cartoon"; "Very childlike, a bit boring, not so modern."; "Very childlike style"; "Tedious children's stuff"; "A bit too childish for me personally."; "characters too abstract - too childlike"; "It was very childlike and I didn't like that the heads were very big to the characters."; "It was presented too childishly for me, which is not a bad thing per se and it is certainly cheaper, but as I said, it felt like it was made for children."; "rather something for children"; "The figures are tailored to children"

"The characters could have been made a little more colourful and …", "Too few different colours", "colourless", "… but a bit cheesy in terms of the colours"

"I liked that they were animals (with human characters it leaves more room for appreciation).", "I also liked the fact that they were animals.", "Animals as characters are always good."

S7.7. Additional analyses
We explored the effect of adding the general causality orientation to our models. We observe that: (1) more of the variance is explained by including GCOS in the model, but (2) the effect is about the same (the coefficients do not change much).
The increase in the explained variance is not surprising though: the more information is available about the participants, the better the model can predict how gamification is experienced. However, in practice, this information about participants is not readily available for the survey designers.
Models presented in the main text Same models, adding the GCOS

S8. Discussion
Proposition S81. Causality Orientation theory
"All individuals have all three causality orientations to some degree." (p. 234) "Subtle cues in the environment may make different orientations more salient at that time and place. Thus, it is possible to prime people's motivational orientations such that their behavior and experience will be significantly affected by the primed motivation even if that orientation is, in general, relatively weak".

Our gamification with simple NPCs satisfied the need for relatedness and increased the perceived social experience. This is an improvement on our previous study (Aubert, Lienert, et al., 2022). However, not all respondents liked the NPCs, and some who liked them were also critical. This decoupling of social experience from relatedness to the NPC confirms that, in gamified interfaces about societal issues, relatedness is composed of (1) in-game relatedness with NPCs and (2) broader relatedness with society (Aubert, Lienert, et al., 2022). In addition, the written qualitative comments might be biased: Previous research has reported that respondents receiving gamified surveys provide richer and more positive answers to open-ended questions than respondents receiving the control treatment (e.g., Bailey et al., 2015). Therefore, we acknowledge that the design choices were equivocal even though our gamification with interactions with simple NPCs satisfied the need for relatedness. However, design appreciation is subjective: Some appreciated the friendly animal-like characters while others complained that they were too childish. Polarized feedback on gamification, ranging from very positive to very negative, has already been reported (Keusch & Zhang, 2015); gamification is very far from being the subject of consensus.
Further research could also focus on the choice of avatar representing the respondent in the gamified survey. Avatars have been shown to trigger affective and cognitive reactions (Triantoro et al., 2019). However, we are not aware of research measuring the degree of identification with the role in a gamified story, or with the avatar chosen by the respondent. Investigating how respondents identify with characters with quantitative instruments would provide clearer insight than the conflicting qualitative data we collected. Nonetheless, our qualitative data did highlight the relevance of identification with NPCs and avatars.
Finally, future research could continue investigating how to increase relatedness to the nonplayer characters, and whether some features would make them more broadly liked.